"Ten common statistical mistakes to watch out for when writing or reviewing a manuscript" (by Tamar Makin and Jean-Jacques Orban de Xivry, 9 Oct 2019) suggests a set of pitfalls to beware of. Their abstract summarizes:
Inspired by broader efforts to make the conclusions of scientific research more robust, we have compiled a list of some of the most common statistical mistakes that appear in the scientific literature. The mistakes have their origins in ineffective experimental designs, inappropriate analyses and/or flawed reasoning. We provide advice on how authors, reviewers and readers can identify and resolve these mistakes and, we hope, avoid them in the future.
And in brief, their list:
- Absence of an adequate control condition/group
... for any studies looking at the effect of an experimental manipulation on a variable over time, it is crucial to compare the effect of this experimental manipulation with the effect of a control manipulation. ...
- Interpreting comparisons between two effects without directly comparing them
... rather than running two separate tests, it is essential to use one statistical test to compare the two effects ...
- Inflating the units of analysis
... If a study aims to understand group effects, then the unit of analysis should reflect the variance across subjects, not within subjects. ...
- Spurious correlations
... Spurious correlations most commonly arise if one or several outliers are present for one of the two variables. ...
- Use of small samples
... When a sample size is small, one can only detect large effects, thereby leaving high uncertainty around the estimate of the true effect size and leading to an overestimation of the actual effect size ...
- Circular analysis
... recycling the same data to first characterise the test variables and then to make statistical inferences from them, and is thus often referred to as 'double dipping' ...
- Flexibility of analysis: p-hacking
... the more variation in one's analysis pipeline, the greater the likelihood that observed effects are not genuine ...
- Failing to correct for multiple comparisons
... When performed with frequentist statistics, conducting multiple comparisons during exploratory analysis can have profound consequences for the interpretation of significant findings ...
- Over-interpreting non-significant results
... a non-significant p-value does not distinguish between the lack of an effect due to the effect being objectively absent (contradictory evidence to the hypothesis) or due to the insensitivity of the data to enable the researchers to rigorously evaluate the prediction (e.g. due to lack of statistical power, inappropriate experimental design, etc.). In simple words - non-significant effects could literally mean very different things - a true null result, an underpowered genuine effect, or an ambiguous effect ...
- Correlation and causation
... Just because variability of two variables seems to linearly co-occur does not necessarily mean that there is a causal relationship between them, even if such an association is plausible. For example, a significant correlation observed between annual chocolate consumption and number of Nobel laureates for different countries (r(20)=.79; p<0.001) has led to the (incorrect) suggestion that chocolate intake provides nutritional ground for sprouting Nobel laureates ...
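Two of the pitfalls above (uncorrected multiple comparisons and outlier-driven spurious correlations) are easy to see with a few lines of arithmetic. What follows is a minimal sketch in plain Python, not anything from the paper: the function names and the toy data are made up for illustration, and the family-wise formula assumes the tests are independent and every null hypothesis is true.

```python
import math


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when every null hypothesis is true (an assumption of this sketch)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold under the Bonferroni correction."""
    return alpha / n_tests


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Multiple comparisons: 20 uncorrected tests at alpha = 0.05.
fwer_uncorrected = familywise_error_rate(0.05, 20)        # about 0.64
per_test_alpha = bonferroni_alpha(0.05, 20)               # 0.0025
fwer_corrected = familywise_error_rate(per_test_alpha, 20)  # just under 0.05

# Spurious correlation: hypothetical toy data with zero correlation,
# then the same data plus a single extreme point.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 3, 3, 1, 2]
r_clean = pearson_r(xs, ys)                      # 0.0
r_with_outlier = pearson_r(xs + [20], ys + [10])  # about 0.93

print(f"family-wise error, uncorrected: {fwer_uncorrected:.2f}")
print(f"family-wise error, Bonferroni:  {fwer_corrected:.3f}")
print(f"r without outlier: {r_clean:.2f}, with outlier: {r_with_outlier:.2f}")
```

With twenty uncorrected tests the odds of at least one false alarm are roughly two in three, and a single extreme point turns six uncorrelated observations into an apparently strong relationship. Bonferroni is only one of several possible corrections, chosen here because it is the simplest to write down.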
(cf Small Number Illusions (1999-08-04), Correlations and Causality (2000-04-09), Science and Pseudoscience (2001-10-06), Square Root of Baseball (2005-05-13), Statistics - A Bayesian Perspective (2010-08-13), Medicine and Statistics (2010-11-13), Introduction to Bayesian Statistics (2010-11-20), Doing Bayesian Data Analysis (2013-11-02), Probability Theory, the Logic of Science (2013-11-18), Statistical Hypothesis Inference Testing (2013-12-01), P-Hacking (2014-09-20), Causal Inference in Statistics (2018-09-16), ...) - ^z - 2019-10-14